VLSI for High-Speed Digital Signal Processing


Quarterly  Progress  Report  —  10/1/92  through  12/31/92 


Programmable  Processor  Ring  Project 

During the past quarter we fabricated and tested a 12-bit by 16-bit multiplier
based on our previous multiplier architecture but using third-order recoding. That
is, each partial product is formed by an 8-to-1 multiplexer selected by 3 input bits
instead of the 4-to-1 multiplexer selected by 2 input bits used previously. The
partial products (1X, 3X, 5X, and 7X the coefficient value) are stored in an on-chip
RAM. Fig. 1 shows the architecture of the test IC. The multiplier core consists of 5000
transistors occupying a chip area of 0.9 mm² (0.88 mm by 1.05 mm) in 1.2-µm CMOS
technology and was simulated to operate in 16 ns (including the register delays). Of
the 24 parts we received from MOSIS, 20 were functional, with worst-case operating
times ranging from 17.5 ns to 19 ns and a mean of 18.175 ns. The distribution for
5 V and 3 V supply voltages is shown in Fig. 2. We have submitted a paper to the
Midwest Symposium on Circuits and Systems detailing the specific implementation
issues of our 12-bit by 11-bit multiplier and our 12-bit by 16-bit multiplier, as well as
our test results.
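
The third-order recoding described above can be illustrated with a small behavioral
model. This is only a sketch under stated assumptions, not the fabricated circuit: it
assumes unsigned data, assumes the even multiples (2X, 4X, 6X) are obtained by
shifting the stored odd multiples, and all function names are hypothetical.

```python
def odd_multiples(coeff):
    """Precompute 1X, 3X, 5X and 7X of the coefficient (models the on-chip RAM)."""
    return {k: k * coeff for k in (1, 3, 5, 7)}


def partial_product(digit, table):
    """Select one partial product for a 3-bit digit (0..7), like an 8-to-1 mux.

    Assumption: even multiples are shifted copies of a stored odd multiple.
    """
    if digit == 0:
        return 0
    shift = 0
    while digit % 2 == 0:  # strip factors of two so only odd multiples are looked up
        digit //= 2
        shift += 1
    return table[digit] << shift


def multiply_radix8(coeff, data, data_bits=16):
    """Multiply coeff by an unsigned data word, consuming 3 data bits per partial product."""
    table = odd_multiples(coeff)
    total = 0
    for i in range(0, data_bits, 3):
        digit = (data >> i) & 0b111  # the 3 select bits for this partial product
        total += partial_product(digit, table) << i
    return total
```

In this model a 16-bit data word needs six partial products, compared with eight when
2 bits are consumed per partial product, and `multiply_radix8(c, d)` agrees with `c * d`
for any unsigned 16-bit `d`.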

We have also completed the design and layout of the ALU for the five-processor
ring. SPICE simulation results show that the ALU performance is limited by the
multiplier (as expected), and we therefore expect the overall ring of five processors
to operate at 40 MHz in 2-µm CMOS technology. We are currently integrating the
ALU with the coefficient and program memory blocks to form a processor and expect
to submit to MOSIS for fabrication a chip containing a complete processor and two
register blocks by the end of this quarter.

A  Programmable  Digital  Signal  Processor  Using  Switchable  Unit-Delays  for  Optimal 
Coefficient  Allocation 

A novel switchable unit-delay has been developed for the efficient implementation
of programmable digital FIR filters and correlators using the well-known canonical
signed digit (CSD) approach. Our design enables high-speed processing while avoiding
the severe hardware inefficiency that would result from the straightforward programmable
tap implementations reported previously [1, 2, 3]. (In a straightforward
implementation many filter-tap "multipliers" would waste valuable computational
resources, since every tap of a programmable structure would need to accommodate
"difficult" coefficient values, while for any specific transfer function most taps
would not require such extreme capabilities.) The switchable unit-delay not only
allows the programming of the number of filter taps and the specific filter-tap coefficient
values, it also provides the capability for programming the optimal allocation of hardware
resources to each tap. The switchable unit-delay together with a 2-digit coefficient
multiplier forms a p-tap, as shown in Fig. 3. The switchable unit-delay can be
programmed to be either a register or a pass-through buffer. By suitable programming
of these switchable unit-delays, filter taps with arbitrary coefficient precision can be
obtained, as illustrated in Fig. 4.
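
To make the allocation idea concrete, the sketch below recodes an integer coefficient
into CSD form and estimates how many 2-digit p-taps it would occupy. It is an
illustration under assumptions, not the chip's programming procedure: it assumes
each p-tap contributes at most two nonzero CSD digits, that p-taps whose switchable
unit-delays are set to pass-through combine into one filter tap, and that a zero
coefficient still occupies one p-tap. The function names are hypothetical.

```python
def to_csd(n):
    """Return the CSD digits of integer n, least significant first, each in {-1, 0, 1}.

    No two adjacent digits of the result are nonzero.
    """
    digits = []
    while n != 0:
        if n % 2 == 0:
            digits.append(0)
        else:
            d = 2 - (n % 4)  # +1 when n = 1 (mod 4), -1 when n = 3 (mod 4)
            digits.append(d)
            n -= d           # n is now divisible by 4, forcing a 0 digit next
        n //= 2
    return digits


def ptaps_needed(coeff, digits_per_ptap=2):
    """Estimate the p-taps one coefficient occupies (assumed allocation rule)."""
    nonzero = sum(1 for d in to_csd(coeff) if d != 0)
    return max(1, -(-nonzero // digits_per_ptap))  # ceiling division, at least one


def allocate(coeffs):
    """Return per-tap p-tap counts and the total number of p-taps used."""
    counts = [ptaps_needed(c) for c in coeffs]
    return counts, sum(counts)
```

For example, 7 recodes to 8 - 1 (two nonzero digits, one p-tap), while 23 recodes to
32 - 8 - 1 (three nonzero digits, two p-taps). A fixed-precision design would have to
give every tap the resources of the most demanding coefficient.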

We have implemented, and submitted to MOSIS for fabrication, the new architecture
in a prototype chip capable of realizing a broad spectrum of linear-phase FIR
filters employing up to 32 taps. This chip will provide 16-bit input and output data
with 20-bit internal precision and will operate at data rates as high as 175 MHz
(simulated) in a die size of 5.9 mm by 3.4 mm using 1.2-µm CMOS technology. The block
diagram of the chip is shown in Fig. 5. To achieve high processing speed, carry-save
addition is implemented with transmission gate adders (Fig. 6) within each p-tap. A
vector merge adder (VMA) using pipelined ripple adders is then used to merge the
carry and sum bits at the output of the last p-tap. The switchable unit-delay (Fig. 7)
is implemented efficiently with a modified true single-phase latch [4]. The chip (Fig.
8) was designed using the Mentor Graphics GDT VLSI CAD tools. We have written a
silicon compiler in the Genie language to assemble the chip with parameterized word
length and number of taps.
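
The carry-save accumulation followed by a vector merge can be modeled at the bit
level as follows. This is a behavioral sketch only, assuming a 20-bit internal word
with wraparound. It does not model the transmission gate adders or the pipelining of
the VMA, and the function names are hypothetical.

```python
WIDTH = 20                  # internal word length reported for the prototype chip
MASK = (1 << WIDTH) - 1


def carry_save_add(s, c, x):
    """Reduce three words to a sum word and a carry word with no carry propagation."""
    new_s = (s ^ c ^ x) & MASK                             # per-bit sum
    new_c = (((s & c) | (s & x) | (c & x)) << 1) & MASK    # per-bit majority, shifted left
    return new_s, new_c


def accumulate(values):
    """Chain carry-save stages, one per incoming value (as along a chain of p-taps)."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v & MASK)
    return s, c


def vector_merge(s, c):
    """Final carry-propagate addition of the sum and carry words (the VMA's job)."""
    return (s + c) & MASK
```

Each carry-save stage has a delay of one full adder regardless of word length, which
is why the single carry-propagate addition is deferred to the end of the chain. In
this model, `vector_merge(*accumulate(values))` equals the sum of the values modulo
2^20.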

Fig. 1: Block diagram of the 12-bit by 16-bit multiplier test IC.


Fig. 2: Distribution of the 12-bit by 16-bit multiplier worst-case delay.


Fig. 4: Filter taps programmed with different coefficient digits.


Fig. 5: Block diagram of the programmable FIR chip.


Fig.  6:  Schematic  of  the  transmission  gate  adder. 


Fig. 7: Schematic of the switchable unit-delay register.

TABLE I

SUMMARY OF THE PROTOTYPE CHIP

Maximum FIR order          32
Technology                 1.2 µm CMOS, single poly, double metal
Die size (with pads)       5.9 mm x 3.4 mm
Input word length          16-bit
Output word length         16-bit
Coefficient word length    16-bit
Internal word length       20-bit
Number of pins             84
Maximum data rate          175 MHz (simulated)

Fig. 8: Layout for the 32-tap programmable FIR filter.


